Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish migration of vars_funs module for Python package #32

Merged
merged 62 commits into from
Dec 4, 2024

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Nov 26, 2024

This PR builds on #31 by defining a vars_recode function in the Python package, which completes the migration of the vars_funs module to Python.

jeancochrane and others added 30 commits November 20, 2024 13:39
@jeancochrane jeancochrane marked this pull request as ready for review November 26, 2024 23:22
@jeancochrane jeancochrane requested a review from a team as a code owner November 26, 2024 23:22
@jeancochrane jeancochrane requested a review from dfsnow November 26, 2024 23:23
Comment on lines 143 to 144
human-readable format. For example, EXT_WALL = 2 will become
EXT_WALL = "Frame + Masonry". Note that the values and their translations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: It's wrong in the R docs too, but EXT_WALL = 2 is "Masonry", not "Frame + Masonry".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks! Fixed in d135b9a.

Comment on lines 131 to 137
def vars_recode(
data: pd.DataFrame,
cols: list[str] | None = None,
code_type: str = "long",
as_factor: bool = True,
dictionary: pd.DataFrame | None = None,
) -> pd.DataFrame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to keep the interfaces across the R and Python versions of our major packages the same. Maybe we can rename the R function inputs here and then release a major version as we did with assesspy. While we're at it we can trim out some of the unused functionality of the R version.

python/ccao/vars_funs.py Outdated Show resolved Hide resolved
if dictionary.empty:
raise ValueError("dictionary must be a non-empty pandas DataFrame")

required_columns = {"var_code", "var_value", "var_value_short"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): var_name, var_type, and var_data_type should be required as well since they're used below.

Edit: Plus all the var_name_ columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I added stricter validation in f5ee577. Note that we check for any column matching the pattern var_name_*, since technically the function doesn't care what the names are, or how many of them there are.

Dictionaries
^^^^^^^^^^^^

Lookups for numeric codes used in the assessment system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Also used for vars_rename, so not just for the numeric code lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, clarified in 2a82c96.

python/tests/test_vars_funs.py Show resolved Hide resolved
@@ -7,7 +7,7 @@

# Load the default variable dictionary
_data_path = importlib.resources.files(ccao.data)
vars_dict = pd.read_csv(str(_data_path / "vars_dict.csv"))
vars_dict = pd.read_csv(str(_data_path / "vars_dict.csv"), dtype=str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (non-blocking): Won't this prevent matching to numeric values in the original data? i.e. 1 instead of "1".

Comment on lines +190 to +200
{
"input": {
"athena": {
"name": "char_bsmt",
"value": ["1", "3", "4", "5"],
},
"iasworld": {
"name": "bsmt",
"value": ["1", "3", "4", "5"],
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): We should add a test for numeric values (and update the function to handle those) or make it clear to the user that vars_recode expect all values to be strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking, although I'm going to wait to update anything until we decide on what the expected behavior should be in this thread.

@jeancochrane
Copy link
Contributor Author

@dfsnow This should be ready for another look!

Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this looks set to go now!

Copy link

codecov bot commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (969dae7) to head (c771664).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master       #32   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          382       382           
=========================================
  Hits           382       382           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jeancochrane jeancochrane merged commit 2b027d4 into master Dec 4, 2024
17 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/further-python-package-migration branch December 4, 2024 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants